The Design of Pre-Processing Multidimensional Data Based on Component Analysis
نویسندگان
چکیده
Increased implementation of new databases related to multidimensional data involving techniques to support efficient query process, create opportunities for more extensive research. Pre-processing is required because of lack of data attribute values, noisy data, errors, inconsistencies or outliers and differences in coding. Several types of pre-processing based on component analysis will be carried out for cleaning, data integration and transformation, as well as to reduce the dimensions. Component analysis can be done by statistical methods, with the aim to separate the various sources of data into a statistical pattern independent. This paper aims to improve the quality of pre-processed data based on component analysis. RapidMiner is used for data pre-processing using FastICA algorithm. Kernel K-mean is used to cluster the pre-processed data and Expectation Maximization (EM) is used to model. The model was tested using wisconsin breast cancer datasets, lung cancer datasets and prostate cancer datasets. The result shows that the performance of the cluster vector value is higher and the processing time is shorter.
منابع مشابه
Analysis of Pre-processing and Post-processing Methods and Using Data Mining to Diagnose Heart Diseases
Today, a great deal of data is generated in the medical field. Acquiring useful knowledge from this raw data requires data processing and detection of meaningful patterns and this objective can be achieved through data mining. Using data mining to diagnose and prognose heart diseases has become one of the areas of interest for researchers in recent years. In this study, the literature on the ap...
متن کاملMonitoring and assessment of a eutrophicated coastal lake using multivariate approaches
Multivariate statistical techniques such as cluster analysis, multidimensional scaling and principal component analysis were applied to evaluate the temporal and spatial variations in water quality data set generated for two years (2008-2010) from six monitoring stations of Veli-Akkulam Lake and compared with a regional reference lake Vellayani of south India. Seasonal variations of 14 differen...
متن کاملSparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains
In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and determination of its quality is very important. Various image processing algorithms are applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model learning concepts includ...
متن کاملThe Effect of Teaching based on the Four-Component Instructional Design Model on the Students’ Learning in Physiology
Introduction: Different models of instructional design play an important role in improving student learning and education. The purpose of this study was to determine the effect of four-component instructional design model on students’ learning in physiology course. Method: The present study was a semi-experimental design with pre-test, post-test and control group. The statistical population of ...
متن کاملDesign and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in dataset that is currently being produced at high speed, large volumes and with a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing the huge amount of data, today has become one of the concerns of data science scholar...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computer and Information Science
دوره 4 شماره
صفحات -
تاریخ انتشار 2011